1. Introduction

Health is directly related to health care, health behaviors, and family health history. However, it is also heavily influenced by socioeconomic position, race and ethnicity, occupation, and social cohesion, so inequities in health can reflect differences in social and economic conditions between communities. Life expectancy is a measure that summarizes health over the entire lifespan. Life expectancy at birth is the average number of years a newborn can expect to live, assuming she or he experiences the currently prevailing death rates throughout her or his lifespan. Because life expectancy is intuitive to compare across communities, it is often used as an indicator of a community's health. [TODO: describe Baltimore life expectancy]
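Life expectancy at birth, as defined above, is a period life-table quantity. The sketch below computes it from age-specific central death rates. It is written in Python for illustration, although the analysis itself uses R. It assumes that deaths fall on average at the midpoint of each closed age interval and that the last age group is open-ended. The function name and inputs are hypothetical, and the published neighborhood estimates may rely on a different life-table method.

```python
def life_expectancy_at_birth(widths, death_rates):
    """Period life expectancy at birth (e0) from age-specific central death rates.

    widths[i] is the width in years of age group i; the last group is treated as
    open-ended, so its width is ignored. death_rates[i] is the central death rate
    m_x of group i. Assumption: deaths occur on average at the midpoint of each
    closed age interval (a_x = n / 2).
    """
    if not death_rates or len(widths) != len(death_rates):
        raise ValueError("widths and death_rates must be non-empty and of equal length")
    if death_rates[-1] <= 0:
        raise ValueError("the open-ended age group needs a positive death rate")
    survivors = 1.0      # l_x, starting from a radix of one newborn
    person_years = 0.0   # running sum of L_x; equals T_0 at the end
    for n, m in zip(widths[:-1], death_rates[:-1]):
        # Probability of dying within the interval, capped at 1.
        q = min(n * m / (1 + (n / 2) * m), 1.0)
        deaths = survivors * q
        # Survivors live the full n years; decedents live n / 2 years on average.
        person_years += n * (survivors - deaths) + (n / 2) * deaths
        survivors -= deaths
    # Open-ended last interval: person-years lived are l / m.
    person_years += survivors / death_rates[-1]
    return person_years  # e0 = T_0 / l_0, and l_0 = 1
```

For example, `life_expectancy_at_birth([1, 4, 0], [0.007, 0.0003, 0.012])` treats ages 0 to 1 and 1 to 5 as closed intervals and 5+ as the open interval.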

Health inequality across Baltimore neighborhoods has been a major issue:

“Roland Park would be the 4th longest-living country in the world, while Seton Hill would be the 230th. Fourteen Baltimore neighborhoods have lower life expectancies than North Korea. Eight are doing worse than Syria.” (Christopher Ingraham, The Washington Post, 2015)

In this study, we focus on life expectancy at the block level in Baltimore City. Our goal is to develop a model that predicts life expectancy in Baltimore down to single-block resolution, with estimates of uncertainty.

2. Method

2.1 Definition

For geographic information about blocks in Baltimore, we use the Census 2010 TIGER shapefiles, from which we obtain the block ID, coordinates, and footprint of each block. Census 2010 defines 13488 blocks in Baltimore City. We use the polygon center of each block (black dots in Fig 1) to calculate the map distance between two locations, which we compute with the Google Maps API through the R package “ggmap”.

Fig 1: Blocks in Fells Point
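Fig 1 marks each block by its polygon center. The sketch below is written in Python, although the study works in R. It assumes that "polygon center" means the area-weighted centroid, computed with the shoelace formula; for blocks this small, treating lon/lat as planar coordinates is a reasonable approximation. It also computes a great-circle (haversine) distance between two lon/lat points. This is a swapped-in straight-line approximation, not the Google Maps travel distance obtained through ggmap, and it will generally be shorter than a road distance. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as a list [(x, y), ...].

    Uses the shoelace formula; vertices may run clockwise or counter-clockwise,
    and the first vertex need not be repeated at the end.
    """
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3 * twice_area), cy / (3 * twice_area)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle (straight-line) distance in km between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A straight-line distance needs no API calls. A travel distance may better reflect access, which is presumably why map distance is used in the text.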

2.2 Data

Outcome variable:

As the outcome variable, we use the 2014 life expectancy for Baltimore City at the neighborhood level (http://bniajfi.org/vital_signs/data_downloads/). Baltimore City is divided into 55 neighborhoods; we use the “CSA2010” (Community Statistical Area) definition in the Census 2010 shapefiles to obtain their geographic boundaries.

Fig 2: Baltimore City Life Expectancy on Neighborhood Level

Independent variables:

Life expectancy is associated with many kinds of factors, such as family disease history, individual physical condition, environment, and socioeconomic status. We therefore consider the following groups of factors as potential predictors in the life expectancy prediction model:

  • Demographics: Age, gender, race…

  • Socioeconomic Characteristics: Income, employment rate, poverty rate, percent of single-parent households…

  • Education: Reading proficiency, school absenteeism, adult educational attainment…

  • Community Built and Social Environment: Alcohol/tobacco store density, juvenile arrest rate, domestic violence rate…

  • Housing: Lead paint violation rate, energy cut-off rate…

  • Food Environment: Fast food density, carryout density, corner store density, supermarket proximity…

  • Health Outcomes: Premature mortality, avertable deaths, top ten causes of death…

Most of the neighborhood-level data are Census data downloaded from BNIA (Baltimore Neighborhood Indicators Alliance). Most of the block-level data come from datasets with location information, such as addresses and coordinates, downloaded from Open Baltimore.
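Turning point records with location information into block-level counts requires deciding which block polygon each point falls in. A minimal ray-casting sketch in Python follows. It assumes the records are already geocoded to coordinates in the same coordinate system as the block polygons. The function names are hypothetical, and in practice the brute-force loop over all blocks would be replaced by a GIS library with a spatial index.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed;
        # this also guarantees y1 != y0, so the division below is safe.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:  # the ray toward +x crosses this edge
                inside = not inside
    return inside


def count_points_per_block(points, blocks):
    """Count (x, y) records in each block; blocks maps block ID -> vertices.

    A point is assigned to the first block that contains it; points outside
    every block are ignored.
    """
    counts = {block_id: 0 for block_id in blocks}
    for x, y in points:
        for block_id, vertices in blocks.items():
            if point_in_polygon(x, y, vertices):
                counts[block_id] += 1
                break
    return counts
```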

As shown above, plenty of neighborhood-level data are available, covering health, socioeconomic status, public safety, housing, and other aspects. However, we need to narrow down the set of potential predictors because many of them are highly correlated, such as crime rate and violent crime rate, or the number of demolition permits and the number of vacant buildings. After several rounds of selection, we chose 22 variables as potential predictors, shown below:

Because the health environment is highly correlated with life expectancy, we also add several variables that represent it: lead paint violations and the density of hospitals, nursing homes, and shelters.

Fig 3: Relationship between life expectancy and significant features.
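The text does not spell out the steps used to narrow down correlated predictors. One common approach, sketched here in Python as an assumption rather than as the study's actual procedure, works through the candidates in order of preference. It drops any variable whose absolute Pearson correlation with an already-kept variable exceeds a threshold. The 0.8 threshold and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(va * vb)


def drop_correlated(columns, threshold=0.8):
    """Greedy pruning: keep a variable only if |r| <= threshold with every kept one.

    columns maps variable name -> values across neighborhoods, in priority order;
    when two variables are highly correlated, the one listed first is kept.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

For instance, with crime_rate listed before violent_crime_rate, a near-perfect correlation between the two keeps only crime_rate. The ordering encodes which variable of a correlated pair is considered more interpretable.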

3. Statistical Method

We want to predict life expectancy for the 13488 blocks in Baltimore City. However, life expectancy, the outcome variable, is available only at the neighborhood level, so we cannot build the prediction model directly at the block level. Instead, we use a generalized linear model (GLM) [TODO: detail statistical method] to train a prediction model on neighborhood data, then plug the block-level predictors into this model to obtain a predicted value for each block. Because some predictors cannot be obtained at the same scale for neighborhoods and blocks (for example, median household income is not available for individual blocks), we need to estimate several predictors at the block level so that they can be plugged into the model built on neighborhood data.
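The text leaves open how block-level values are estimated for predictors, such as median income, that are only published for neighborhoods. One possible approach is inverse distance weighting (IDW), sketched here in Python; it is not necessarily the method used in this study. IDW estimates a block's value from the values at nearby neighborhood centroids. A simpler alternative is to give every block the value of the neighborhood that contains it. The power of 2, the planar coordinates, and the function name are assumptions.

```python
import math


def idw_estimate(target, sources, power=2.0):
    """Inverse-distance-weighted estimate of a predictor at point target = (x, y).

    sources is a list of ((x, y), value) pairs, e.g. neighborhood centroids and
    the neighborhood values of one predictor, in a projected (planar) coordinate
    system. If target coincides with a source, that source's value is returned.
    """
    weighted_sum = weight_total = 0.0
    for (x, y), value in sources:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value
        w = d ** -power  # closer neighborhoods get larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

IDW smooths values across neighborhood boundaries, whereas the "inherit the neighborhood value" approach produces a step at every boundary. Which is more realistic depends on the predictor.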

3.1 Data Transformation

Transformations include normalization, log transforms, and splines: a spline for logprotax with knots at 7.5 and 8.8, a quadratic spline for numb_crime with a knot at 4300, and a spline for pubart with a knot at 17.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   57.53   71.22   74.36   75.06   78.42   99.57    1105
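The transformations listed above can be sketched in Python with a truncated-power spline basis. The sketch assumes several things. The quoted numbers are knot locations. The splines for logprotax and pubart are piecewise-linear, since the text calls only the numb_crime spline quadratic. logprotax is already on the log scale. spline_features is a hypothetical helper. In R, splines::bs() with degree = 1 or 2 fits the same piecewise polynomials using a B-spline basis.

```python
import math


def zscore(values):
    """Center and scale to unit sample standard deviation, as R's scale() does."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def linear_spline_basis(x, knots):
    """Truncated-power basis of a piecewise-linear spline: x, (x - k)+ per knot."""
    return [x] + [max(x - k, 0.0) for k in knots]


def quadratic_spline_basis(x, knots):
    """Truncated-power basis of a quadratic spline: x, x^2, (x - k)+^2 per knot."""
    return [x, x * x] + [max(x - k, 0.0) ** 2 for k in knots]


def spline_features(logprotax, numb_crime, pubart):
    """Hypothetical feature row using the knots quoted in the text."""
    return (linear_spline_basis(logprotax, [7.5, 8.8])
            + quadratic_spline_basis(numb_crime, [4300])
            + linear_spline_basis(pubart, [17]))
```

Each (x - k)+ column lets the slope change at knot k while keeping the fitted curve continuous there.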

3.2 Model

Because the outcome variable is continuous and the number of observations is small (55 neighborhoods), we use a (generalized) linear model. Based on a comparison of AIC across simple GLM fits and on the distribution of the outcome variable in the plots above, we choose the Gamma family for the GLM.
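A sketch of the model fit, written in Python for illustration, follows. It assumes a log link. The text names only the Gamma family, and R's Gamma() family defaults to the inverse link, so the link actually used is unknown. With a log link the fit corresponds to glm(y ~ ..., family = Gamma(link = "log")) in R. The helper names are hypothetical, and the outcome must be strictly positive.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gamma_log(X, y, max_iter=50, tol=1e-10):
    """Fit a Gamma GLM with log link by IRLS; each row of X includes an intercept 1.

    With a log link and Gamma variance V(mu) = mu^2, every IRLS working weight
    is 1, so each iteration is an ordinary least-squares fit of the working
    response z = eta + (y - mu) / mu on X.
    """
    p = len(X[0])
    eta = [math.log(v) for v in y]  # start from the log of the observed values
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        z = [e + (v - m) / m for e, v, m in zip(eta, y, mu)]
        # Normal equations X'X beta = X'z.
        xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
        xtz = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(p)]
        new_beta = solve(xtx, xtz)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        eta = [sum(b * xi for b, xi in zip(beta, row)) for row in X]
        if converged:
            break
    return beta
```

The Gamma dispersion parameter does not affect the coefficient estimates, so it is not estimated here. It is needed only for standard errors and tests.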

We use cross-validation to evaluate candidate models and then choose the one with the best performance.
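The text does not say how many folds or which performance metric is used. The Python sketch below assumes k-fold cross-validation scored by root-mean-square error (RMSE). The model is passed in as hypothetical fit and predict callables. With only 55 neighborhoods, setting k to the number of neighborhoods gives leave-one-out cross-validation.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_rmse(X, y, fit, predict, k=5, seed=0):
    """K-fold cross-validated RMSE for a model given as fit/predict callables.

    fit(X_train, y_train) returns a model; predict(model, x_row) returns a number.
    """
    sq_err = 0.0
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_err += sum((predict(model, X[i]) - y[i]) ** 2 for i in fold)
    return math.sqrt(sq_err / len(y))
```

Using the same seed for every candidate model keeps the folds identical, so the RMSE differences reflect the models rather than the splits.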

3.3 Variable Selection

We use likelihood ratio tests to select the independent variables.
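For nested Gamma GLMs, the likelihood ratio test compares residual deviances scaled by the dispersion parameter, which is unknown for the Gamma family. The Python sketch below assumes the two deviances come from nested fits. It also assumes the dispersion is estimated from the larger model, for example as Pearson χ² divided by the residual degrees of freedom. The helper names are hypothetical. The chi-square tail probability uses the standard recursion in the degrees of freedom, so only integer degrees of freedom are supported.

```python
import math


def gamma_deviance(y, mu):
    """Gamma residual deviance: 2 * sum(-log(y / mu) + (y - mu) / mu)."""
    return 2.0 * sum(-math.log(yi / mi) + (yi - mi) / mi for yi, mi in zip(y, mu))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then step up by 2 using
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += math.exp(k / 2.0 * math.log(half) - half - math.lgamma(k / 2.0 + 1))
        k += 2
    return q


def lr_test(dev_reduced, dev_full, df_diff, dispersion):
    """Scaled likelihood-ratio test for nested Gamma GLMs; returns (statistic, p-value)."""
    stat = (dev_reduced - dev_full) / dispersion
    return stat, chi2_sf(stat, df_diff)
```

Because the dispersion is estimated rather than known, an F test is often preferred for Gamma models; R's anova() offers both through test = "Chisq" and test = "F".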

3.4 Train Model

3.5 Apply Model to Block-Level Data
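Applying the model to blocks amounts to plugging each block's predictor row into the fitted linear predictor and mapping the result back to the life-expectancy scale. The Python sketch below assumes the log link, so the inverse link is exp. It takes the coefficients and their covariance matrix as given, for example from coef() and vcov() in R. Blocks with any missing predictor get no prediction. The interval is a Wald interval on the link scale, transformed with exp. It therefore reflects only uncertainty in the fitted coefficients, not block-to-block variation or the error of estimating block-level predictors. predict_block is a hypothetical name.

```python
import math


def predict_block(x, beta, cov, z=1.96):
    """Predicted value and approximate 95% interval for one block.

    x: predictor row (including the intercept 1), or None / containing None if
    any predictor is missing; beta, cov: coefficients and their covariance
    matrix from a log-link Gamma GLM.
    """
    if x is None or any(v is None for v in x):
        return None  # no prediction for blocks with missing predictors
    eta = sum(b * v for b, v in zip(beta, x))
    p = len(x)
    var_eta = sum(x[i] * cov[i][j] * x[j] for i in range(p) for j in range(p))
    se = math.sqrt(max(var_eta, 0.0))  # guard against tiny negative rounding error
    # Build the interval on the link scale, then map it back with exp().
    return math.exp(eta), math.exp(eta - z * se), math.exp(eta + z * se)
```

Building the interval on the link scale keeps both bounds positive and makes the interval asymmetric around the point estimate.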

Question

4. Results

5. Discussion

6. References